PROCESSING PROCEDURE FOR AN ELECTRONIC SYSTEM SUBJECT
TO TRANSIENT ERROR CONSTRAINTS AND A MEMORY ACCESS
MONITORING DEVICE
DESCRIPTION 

Technical field 

The invention relates to a processing procedure 
for an electronic system subject to transient error 
constraints and a memory access monitoring device, for 
example for use in space. 

State of prior art 

The process according to the invention relates to 
all computer architectures subject to transient errors. 
For example, the following fields use computers subject 
to disturbing environments for electronic components 
(radiation, electromagnetic disturbances) that could 
generate this type of error: 

- space, nuclear and aeronautical industries, in 
which the environment includes heavy ions, 

- automobiles, subject to a severe electromagnetic environment.

The space industry is used as an example 
throughout the rest of the description, because it is 
very representative of random transient errors 
generated on electronic components, and because this is 
the field in which the process according to the 
invention was initially developed and evaluated. 

Designers of computer architectures for satellites are faced with the problem of radiation that exists in space but is filtered out by the earth's atmosphere. This radiation may have a "singular event" effect that causes temporary state changes of bits in memory components, in internal registers of microprocessors or in other integrated components. Errors generated by these singular events may produce incorrect data, for example a bad control of a satellite actuator, or a serious disturbance of the software sequence, for example by crashing a microprocessor.
Up to now, the solution for singular event type errors was to use integrated circuit technologies referred to as "radiation tolerant", which are not very sensitive to this phenomenon, or "radiation hardened", which are insensitive to it. These technologies, which are not used in industrial microelectronics, were developed specifically for military and space applications.

The overall cost associated with the existence of these microelectronic technologies and with the development of components using them, and therefore the selling price of these components, is very high. The cost ratio between a hardened circuit and a commercial circuit may be 100 or more.
The market share of "high reliability" military components dropped sharply, from 80% in the 1960s to less than 1% in 1995. Starting from 1994, the American Department of Defence reduced the use of military electronic components in its applications and accelerated the move towards commercial specifications / standards / components for military activities.

As described in document reference [1] at the end of this description, the use of commercial electronic components has become a challenge that the space industry needs to face.

The use of commercial components in space applications is a problem that all new generation projects face. A major problem to be solved is then the sensitivity of these components to radiation, and particularly to heavy ions. This aspect was previously dealt with at "component" level and now needs to be solved at the "architecture" and "system" levels.

As described in the two documents reference [2] and [3], satellites, and therefore their onboard electronics, are subjected to a radiation environment composed of different particles (electrons, heavy ions, protons). These particles do not reach systems on the ground, because they are filtered by the atmosphere.

These particles may be due to:

- cosmic radiation, composed of extremely high energy ions and originating partly outside the galaxy and partly within it,

- radiation belts composed of trapped electrons and protons generated by interactions between the earth's atmosphere and solar particles,

- solar eruptions that emit protons or heavy ions,

- the solar wind generated by the evaporation of coronal plasma, which allows low energy protons and ions to escape from the gravitational pull of the sun.
These high energy particles strike and pass through an electronic component, transferring part of their energy to it and thus disturbing its normal operation. These problems are called "singular events" created by heavy ions and protons.

These singular events correspond to the generation of errors in cells that memorize binary values, and they cause bit errors. As a general rule, a single bit is modified by a heavy ion. These events are not destructive, and new data can be written afterwards. The new data are memorized without errors, unless another singular event occurs in the same cell. This is why the term "transient fault" will be used throughout the rest of this description to characterise errors generated by these phenomena.

As already mentioned above, the onboard electronics of satellites are usually manufactured using components insensitive to radiation. These components are either specially made for this purpose or selected from components that were not specifically manufactured for it.

A first possibility for using commercial components in space on a large scale is to select them by testing commercial components systematically under radiation. This method would firstly be very expensive in terms of selection. It would also be inefficient, because it would not necessarily be possible to use large industrial standards, although this would be desirable.
Another, economically more attractive, possibility would be to reduce the constraints on the choice of components. This would consist of finding methods by which phenomena generated by radiation, and particularly transient errors, could be tolerated. In other words, it would mean defining architectures by which errors could be detected and then corrected. Transient faults would then be taken into account at the "architecture" and "system" levels instead of at the "component" level.

Documents reference [4], [5] and [6] describe a set of fault detection, isolation and recovery mechanisms. Some mechanisms are used simply to detect errors, others to detect and then mask them, and others to correct them. Furthermore, these mechanisms are suited to the processing of temporary faults, of permanent failures, or of both.

A brief reminder of the usual mechanisms is given below, with a few example applications, particularly in the space industry:

- Avoidance of faults: systematic refreshment of static data before they are actually used; "off-line" self-tests (not during nominal operation) in order to detect a component failure before the component is used.

- Error detection or detection / correction codes applicable to memories, communications and possibly the logic, mainly for the manufacture of Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) with integrated control. Error Detection And Correction (EDAC) circuits are systematically used for memories in space. A systematic rereading ("scrubbing") function of the entire memory is associated with these circuits and runs as a scrub task. Its purpose is to avoid the accumulation of dormant errors, which would eventually make detection / correction impossible.

- Duplication and comparison, or triplication and majority vote ("N Modular Redundancy", modular redundancy of order N). These mechanisms can give "fail safe" architectures, which stop at the first fault without generating a bad command (duplex). They can also give architectures that remain operational during a failure ("fail operational"), which can mask a single error in real time and continue while remaining "safe" (triplex). This class also contains master / controller architectures in which only the microprocessors are duplicated, the data output by the "master" then being verified by the "controller"; the ERC-32 microprocessor made by the MHS S.A. company includes such a mechanism.

- Multiple programming method ("N-version programming") associated with modular redundancy architectures of order N, which are also capable of detecting software design errors. Each computer is provided with a software version developed separately from a common specification.

- Time redundancy: the objective is either to use two successive executions followed by a comparison, or to use a single execution followed by loading a command register and then rereading it in order to make a comparison and a validation. An example of the latter is the "arm then fire" mechanism used in space for very critical commands, such as triggering pyrotechnic elements.

- Check of the execution time: "watchdogs" (time counters that verify that a program is executed within a limited time) are used in all space computers. Furthermore, more detailed checks on the execution time may be built into the software, for example checking the duration of a task or the maximum allowable duration to obtain a reply from communication elements. Software is also used to set checks on task execution times.

- Verification of the control flow, for example checking the sequencing of a microprocessor. Watchdogs enable a coarse check: they can detect a hard crash. A finer check of the instruction flow can be made with a more or less complex monitoring processor. A check using signature analysis is particularly efficient and does not require much electronics. This concept was built into the ERC-32 made by the MHS S.A. company. However, a specific compiler that calculates reference signatures and incorporates them into the code was needed to make the concept transparent to the user.

- Check of the validity of a microprocessor address, based on access rights by page / segment.

- Plausibility check: this principle is used in satellite Attitude and Orbit Control Systems (SCAO). Data from several types of sensors are compared to detect any inconsistencies. Alternatively, an item of data is compared with an estimated reference obtained by a prediction filter on the previous values, or with a predefined range. "Fault tolerance based on algorithmic processing" methods represent a sub-class of plausibility checks. In these methods, the verification is based on executing a second algorithm, for example a reverse algorithm that reproduces the initial data from the results obtained if those results are error free.

- Structural or semantic check of data, requiring relatively complex data structures.

- Complementary error recovery concepts, mainly restart points, for faults that the mechanisms described above are unable to correct: regular backup of contexts and restart from the last saved context.

- Another means of error recovery is to reinsert a faulty resource by transfusing a healthy context into the defective computer, in order to restore the initial detection / correction capability.
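The triplication and majority vote mechanism listed above can be illustrated by a minimal Python sketch of a bitwise 2-out-of-3 vote over three copies of a data word. The function names `majority_vote` and `tmr_vote` are hypothetical. In a real triplex the vote would normally be implemented in hardware rather than in software.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit takes the value
    held by at least two of the three copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_vote(a: int, b: int, c: int) -> tuple:
    """Return (voted word, True if a disagreement was masked)."""
    voted = majority_vote(a, b, c)
    masked = not (a == b == c)
    return voted, masked


word = 0b1011_0010            # 178
upset = word ^ 0b0000_1000    # same word with one bit flipped (simulated upset)
print(tmr_vote(word, upset, word))  # (178, True)
```

The vote masks any error confined to one copy. It fails only if two copies are corrupted in the same bit position.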

Known documents also describe time redundancy.

Document reference [5] describes the possibility of executing a task three times in sequence and "voting" the result.

The possibility of carrying out order N modular redundancy by software is also mentioned theoretically in document reference [6].

This document also describes another method for discriminating transient faults from permanent faults, and possibly for correcting them. Detection is not based on time redundancy but may consist, for example, of data coding. If an error is detected, the processing is done a second time. If the second execution gives error-free results, the error was transient and there is no point in reconfiguring the system. Otherwise, the fault is permanent and a reconfiguration is necessary.
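The transient / permanent discrimination described above can be sketched as follows. The sketch assumes that the detection code is a simple even-parity check on the result word. The names `run_with_retry` and `parity_ok` are hypothetical and are not taken from document [6].

```python
from typing import Callable, Optional, Tuple


def run_with_retry(process: Callable[[], int],
                   is_valid: Callable[[int], bool]) -> Tuple[str, Optional[int]]:
    """Execute `process`; if its result fails the detection check,
    execute it a second time to classify the fault.

    Returns ("ok", result), ("transient", result) or ("permanent", None).
    """
    result = process()
    if is_valid(result):
        return "ok", result
    result = process()                 # detection made: run a second time
    if is_valid(result):
        return "transient", result     # error-free retry: no reconfiguration
    return "permanent", None           # error persists: reconfigure


def parity_ok(word: int) -> bool:
    # Hypothetical detection code: valid words have even parity.
    return bin(word).count("1") % 2 == 0


outputs = iter([0b0111, 0b0110])       # first run corrupted, second clean
print(run_with_retry(lambda: next(outputs), parity_ok))  # ('transient', 6)
```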

In both documents, time redundancy is presented as theoretically possible, but no information is provided about how to achieve it in practice, and no specific developments are mentioned. Some problems are not even considered, in particular whether the vote should be made by the microprocessor itself or by an external device independent of the microprocessor. The result of the vote needs to be robust because it is a decision-making element, and yet a malfunction can occur in the microprocessor (data error, sequencing crash, etc.). The vote made by the microprocessor is therefore a major element that is not considered. Furthermore, the granularity on which detection is based is not defined.

Document reference [7] is slightly more specific. It describes a comparative evaluation of two error detection methods, one of which is called "modular triple software redundancy". Triple modular redundancy is normally performed in hardware. The method evaluated in this publication uses time redundancy by successive executions of the software. All modules, particularly the vote module, are implemented in software and executed on the same microprocessor, so this is a purely software approach.

FIGURE 1, which corresponds to FIGURE 1 in this document, illustrates how detection works. Each of three vote modules 1, 2 and 3 compares the results of executing three procedures 4, 5 and 6. The three modules are followed by a decision-making stage 7, which compares the results output by the vote modules to check the consistency of the three processing steps. The modular triple software redundancy is programmed on an MC68000 microprocessor, and about 1500 errors were injected to validate this software. The memory is not protected from errors by an error detection and correction circuit. This document concludes that the only errors that cannot be detected are those that cause one processing step to disturb another. Furthermore, not all errors affecting communications between programs are tolerated.

This document describes a specific example of an architecture in which the processing is executed three times consecutively (time redundancy). The vote module is also executed three times, and the results of the vote modules are then voted themselves. The vote is therefore in no way secure, which is why it has to be triplicated. The final decision is made by the last stage illustrated in the figure. This stage is made secure indirectly by the fact that it is necessarily very small, since only a few lines of code are needed to vote three items of data. Statistically, singular events directly affecting this module are negligible, but this provides no protection against microprocessor sequencing errors.
Document reference [8] also presents a "triple software modular redundancy" implementation on an iAPX 432 type microprocessor, in an embodiment similar to that presented in document reference [7]. Each software sub-task is executed three times consecutively, and then a software vote module is executed three times. Consequently, the vote is not secure, since these tasks are carried out asynchronously on the same microprocessor. An error injection phase demonstrated propagation of errors between sub-tasks, which tends to show that there is no barrier to errors between different sub-tasks.

Document reference [9] gives a general overview of processing procedures for electronic or digital systems subject to transient error constraints, and mentions spatial redundancy and time redundancy as processing means.

Document reference [10] also discloses a processing procedure for an electronic system subject to errors, which proposes the use of a single physical sequence in order to avoid redundant sequences.

Document reference [11] describes processes applied to recent microprocessors to enable memory management and virtual memory. A principle of limited access to information is considered: each process is given access rights to a page or a segment, and these access rights are controlled in real time.

An estimate of the rate of singular events was made for a typical computer for use in space. This rate obviously depends on assumptions such as the number of memory cells and the sensitivity value used for a unit cell. A simulation of the criticality of errors on the management of the attitude of a satellite was also made. It considered a bad command generated on a medium critical actuator of an attitude and orbit control system, for example a reaction wheel. It was found that the singular event rate is low, but not low enough for this phenomenon to be neglected for two types of controls:

- the most critical controls: pyrotechnics, propulsion units, battery management, etc. The risk of losing a satellite several times per year cannot be accepted;

- medium critical controls: reaction wheels, magneto-couplers, etc. Some missions (particularly telecommunications) are not compatible with the attitude disturbances that these errors could generate, even if these errors remain limited.

Furthermore, the mean interval between two singular events is very much longer than the period of the computer real time cycle.

The purpose of the invention is to propose a 
processing procedure for an electronic system subject 
to transient error constraints in order to use 
commercial components despite their sensitivity to 
singular events, making it possible to detect the 
appearance of transient errors and to correct them. 

Presentation of the invention 

This invention relates to a processing procedure for an 
electronic system subject to transient error constraints, for 
example in the space industry, characterised in that two virtual 
sequences installed on a single physical sequence are 
multiplexed in one given real time cycle (the data resulting 
from each execution of a virtual sequence being stored so that 
they can be voted before use), and in that if an error is
detected, the real time cycle in progress is inhibited and a 
healthy context is reloaded to make a restart that consists of a 
nominal execution of the next cycle starting from the reloaded 
context. 

Error correction is thus made in two steps. First, a healthy context is reloaded, namely the context calculated during the real time cycle preceding the cycle in which the error was detected. Then the cycle following the one in which the error was detected is executed nominally (with new acquisitions), starting from the restored context. This type of correction is characterised by the appearance of a "hole" in the execution of the software, in the real time cycle in which the error was detected.
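The following Python sketch illustrates the principle under simplifying assumptions: a single task, a context held in a dictionary, commands returned as a list, and a single-bit upset injected in software. The names `real_time_cycle` and `make_integrator` are hypothetical. Both virtual sequences run on the same acquisition, each from a private copy of the healthy context. Their results are stored and voted, and a mismatch inhibits the cycle so that the previous context stays in place.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
# A task maps (context, acquisition) to (new context, commands).
Task = Callable[[Context, Any], Tuple[Context, List[int]]]


def real_time_cycle(task: Task, context: Context, acquisition: Any
                    ) -> Tuple[Context, Optional[List[int]]]:
    """One real time cycle with time duplication on a single processor.

    The task is executed twice (two "virtual sequences") on the same
    acquisition, each from a private copy of the healthy context. Both
    results are stored and voted before any command is used.
    """
    ctx_a, cmds_a = task(copy.deepcopy(context), acquisition)
    ctx_b, cmds_b = task(copy.deepcopy(context), acquisition)
    if ctx_a == ctx_b and cmds_a == cmds_b:
        return ctx_a, cmds_a        # vote OK: new context, commands emitted
    # Error detected: the cycle is inhibited (a "hole"); the previous healthy
    # context is kept, so the next cycle restarts nominally from it.
    return context, None


def make_integrator(fault_at: int) -> Task:
    """Hypothetical task that accumulates acquisitions; execution number
    `fault_at` suffers a simulated single-bit upset."""
    calls = {"n": 0}

    def integrator(ctx: Context, acq: Any) -> Tuple[Context, List[int]]:
        calls["n"] += 1
        total = ctx["total"] + acq
        if calls["n"] == fault_at:
            total ^= 0x10           # simulated singular event
        return {"total": total}, [total]

    return integrator


ctx: Context = {"total": 0}
log = []
task = make_integrator(fault_at=3)   # first execution of cycle 2 is hit
for acq in [1, 2, 3]:
    ctx, cmds = real_time_cycle(task, ctx, acq)
    log.append(cmds)
print(log)   # [[1], None, [4]]
```

Acquisition 2 is lost with the inhibited cycle; this is the "hole". Cycle 3 then restarts nominally from the context produced by cycle 1. With only two executions, the vote cannot tell which one is correct, so both are discarded.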

Advantageously, the following characteristics are also possible.

There are three error confinement areas (time, software and hardware):

- time confinement: errors cannot propagate from one real time cycle to another;

- software confinement: errors cannot propagate from one software task to another or from one virtual sequence to another;

- hardware confinement: errors occurring in the acquisition electronics or in the control unit are prevented from propagating into the control electronics (no generation of false commands).

A memory plane in the control unit, protected from singular events by an error detection and correction code, can also be used.

The selected detection / correction granularity may also be the operational cycle of the software tasks running on the computer. Compared with the usual solutions known to an expert in the subject, this can considerably reduce the constraints added by two functions: the "backup context" function, which is activated regularly, and the "restore context" function, which is activated at the time of an error correction. The reduction comes from the number of context variables being kept to the strict minimum at the boundary between two real time cycles.

The "backup context" function is activated regularly and may be achieved by an index change. The advantage is that this function has almost no impact on the development cost of the software or on its execution time by the microprocessor. The only impact is that the function must copy context variables whose life exceeds the detection / correction granularity, i.e. the real time cycle.

The "restore context" function activated during an error correction may rely on a simple rule concerning the index that indicates the context considered healthy (error free) in the previous real time cycle. When no error is detected, this index is swapped; when an error is detected, it must not be swapped. This "no swap" is inherent to the inhibition of the real time cycle in which the error is detected. As a result, this function has no impact on the development cost of the software or on its execution time by the microprocessor, which is unusual in the solutions typically known to an expert in the subject.
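A minimal sketch of the index-based backup and the "no swap" restore follows. It assumes two context tables held in ordinary memory; the `ContextStore` class and its methods are illustrative only. In practice, only variables whose life exceeds one real time cycle would need the copy done in `begin_cycle`.

```python
from typing import Any, Dict, List


class ContextStore:
    """Hypothetical pair of context tables selected by an index.

    "Backup context" is an index swap at the end of a healthy cycle;
    "restore context" is simply not swapping when the cycle is inhibited.
    """

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.tables: List[Dict[str, Any]] = [dict(initial), dict(initial)]
        self.healthy = 0    # index of the context known to be error free

    @property
    def working(self) -> Dict[str, Any]:
        """Table written during the current cycle (never the healthy one)."""
        return self.tables[1 - self.healthy]

    def begin_cycle(self) -> Dict[str, Any]:
        # Copy forward the variables whose life exceeds one real time cycle;
        # in this sketch that is the whole (small) context.
        self.working.clear()
        self.working.update(self.tables[self.healthy])
        return self.working

    def end_cycle(self, vote_ok: bool) -> None:
        if vote_ok:
            self.healthy = 1 - self.healthy   # backup context: swap the index
        # else: no swap, so the previous healthy context is implicitly restored


store = ContextStore({"count": 0})
for vote_ok in [True, False, True]:     # the vote fails in cycle 2
    work = store.begin_cycle()
    work["count"] += 1
    store.end_cycle(vote_ok)
print(store.tables[store.healthy])      # {'count': 2}
```

The increment made in the inhibited cycle 2 is discarded without any explicit restore code.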

A segmentation of the memory associated with a specific access rights checking device can be used; this device allows different and arbitrary segment sizes. This hardware access rights checking device can enable several access configurations, each allowing access to one or several non-contiguous segments. It also enables access configurations to be selected according to logical combinations of one or several keys.
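As a rough software model of such a checking device, the sketch below uses segments of arbitrary size and configurations that group possibly non-contiguous segments, each configuration being enabled by a logical combination of keys. The memory map, the key names `seq_a` / `seq_b` and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Keys = Dict[str, bool]


@dataclass(frozen=True)
class Segment:
    name: str
    base: int
    size: int          # arbitrary size, chosen per application

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


@dataclass(frozen=True)
class Configuration:
    segments: FrozenSet[str]            # segments need not be contiguous
    enabled: Callable[[Keys], bool]     # logical combination of keys


def access_allowed(addr: int, keys: Keys, segments: List[Segment],
                   configs: List[Configuration]) -> bool:
    """True if a configuration enabled by the current keys grants access
    to the segment containing `addr`; any other access is an error."""
    for seg in segments:
        if seg.contains(addr):
            return any(seg.name in cfg.segments and cfg.enabled(keys)
                       for cfg in configs)
    return False    # address outside every segment


# Hypothetical memory map: code plus one private segment per virtual sequence.
segments = [Segment("code", 0x0000, 0x4000),
            Segment("vseq_a", 0x6000, 0x1000),
            Segment("vseq_b", 0x8000, 0x1000)]
configs = [Configuration(frozenset({"code", "vseq_a"}),
                         lambda k: k["seq_a"] and not k["seq_b"]),
           Configuration(frozenset({"code", "vseq_b"}),
                         lambda k: k["seq_b"] and not k["seq_a"])]
keys = {"seq_a": True, "seq_b": False}
print(access_allowed(0x6010, keys, segments, configs))  # True
print(access_allowed(0x8010, keys, segments, configs))  # False
```

While the first virtual sequence runs, a stray write into the other sequence's segment is refused. This is one way such a device could help confine errors between virtual sequences.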

Variables / data to be voted may be laid out flat in tables in order to obtain a simple vote module that can be reused in different applications. In this case, the vote module has a negligible influence on the software development cost.

A software vote can be used whose integrity is ensured by checks, particularly by a software monitoring processor and by hardware. If no errors are detected, the vote also triggers the authorisation of transfers to the control electronics.
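A minimal sketch of a generic vote on flat tables, assuming that each voted variable is serialised as a little-endian 32-bit float. Because the vote is a raw comparison of bytes, it does not depend on what the variables mean. `pack_table` and `vote_and_generate_commands` are hypothetical names, and the `send` callback stands in for the transfer to the control electronics.

```python
import struct
from typing import Callable, Sequence


def pack_table(values: Sequence[float]) -> bytes:
    """Hypothetical flat layout: each voted variable stored as a
    little-endian 32-bit float, one after the other."""
    return struct.pack(f"<{len(values)}f", *values)


def vote_and_generate_commands(table_a: bytes, table_b: bytes,
                               send: Callable[[bytes], None]) -> bool:
    """Generic vote: a raw comparison of the two flat tables, independent
    of what the variables mean, so the same module can be reused."""
    if table_a != table_b:
        return False          # error detected: no transfer is authorised
    send(table_a)             # vote OK: authorise transfer to control electronics
    return True


sent = []
a = pack_table([0.5, -1.25, 3.0])
b = pack_table([0.5, -1.25, 3.0])
c = pack_table([0.5, -1.25, 3.5])      # second execution disagrees
print(vote_and_generate_commands(a, b, sent.append))  # True
print(vote_and_generate_commands(a, c, sent.append))  # False
print(len(sent))                                      # 1
```

A bitwise comparison is appropriate here because two executions of the same deterministic code on the same inputs are expected to produce bit-identical tables.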

Finally, transfers to the control electronics can be checked by a hardware device that enforces access rights and limits the validity time of these transfers (time validation window). This delimits a hardware error confinement area.
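The time validation window can be modelled very roughly as below. The sketch covers only the time limit: the access-right check on the transfer is assumed to be done elsewhere, time is counted in abstract ticks, and the assumption that only a successful vote opens the window is made for illustration. `TimeValidationWindow` is a hypothetical name.

```python
from typing import Optional


class TimeValidationWindow:
    """Hypothetical software model of the hardware time validation window.

    Models only the time limit; the access-right check on the transfer is
    assumed to be performed separately. Time is counted in abstract ticks.
    """

    def __init__(self, duration: int) -> None:
        self.duration = duration
        self.closes_at: Optional[int] = None   # window closed at start-up

    def open(self, now: int) -> None:
        """Called only after a successful vote (assumption)."""
        self.closes_at = now + self.duration

    def transfer_allowed(self, now: int) -> bool:
        """A transfer to the control electronics is valid only while the
        window is open; a late transfer (e.g. after a crash) is refused."""
        return self.closes_at is not None and now < self.closes_at


win = TimeValidationWindow(duration=100)
print(win.transfer_allowed(0))     # False: no vote yet
win.open(now=500)
print(win.transfer_allowed(550))   # True
print(win.transfer_allowed(600))   # False: window expired
```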

Thus the process according to the invention includes:

- Duplication of the execution of tasks in time, and a vote on the tables produced;

- Detection of all data errors by the vote on the tables;

- Detection of sequencing errors by the hardware and software security devices for the vote and by the check of access rights;

- Correction transparent to the application: everything takes place as if there were a "hole" in a real time cycle.
Minimisation of specific developments is one of the advantages of the process according to the invention, namely:

- For software:

• grouping of voted variables in tables,

• the "Vote and generation of commands" module, which is usually reusable in different applications,

• management of the process according to the invention (management of time duplication, of the hardware devices for monitoring memory accesses and of the time validation window, and error correction).

- For hardware:

• monitoring of memory accesses and the time validation window. These are simple components that have to be integrated into an FPGA circuit or, better, into the ASIC circuit usually associated with the microprocessor (address decoding, etc.), and they are also reusable in different applications;

• protection of the electronics of critical commands by usual fault tolerant mechanisms (for example instrumentation).

Therefore, the process according to the invention has the following advantages:

- Very little hardware development,

- Very little software development,

- Minimisation of recurrent costs (only one computer) compared with other fault tolerant architectures.

These characteristics are not frequently found in fault tolerant applications.

Furthermore, the use of "commercial" components in the space industry has the following main advantages:

- It solves the problem of the observed reduction in the availability of "high reliability" components, because the major suppliers no longer supply this market.

- Reduction in costs, an aspect made more pressing by the budget context. The "high reliability components" item is not negligible in the total development cost of the equipment, and becomes overriding in its recurrent cost.

- Use of higher performance functions / components in order to reduce the volume of the electronics and / or increase functionalities.

- Reduction of the development time of projects, to offer more reactive access to space; the procurement time for "high reliability" components is typically one or two years.

Advantageously, the process according to the invention has a generic purpose and may be used in all types of computers subject to transient error constraints, regardless of the origin of these errors (cosmic radiation, electromagnetic pulse, etc.). It is, however, quite naturally applicable to the space field.

This invention also relates to a memory access monitoring device (SAM) in a computer, particularly one including a control unit built around a microprocessor and a memory. The device is characterised in that the memory is partitioned into segments, and in that each segment has an access right defined by a logical function of all or some of the keys available in the device, the access right to each segment being controlled in real time. Some segments authorise access only if there is a very high probability that the microprocessor is in good operating condition, which allows safe storage of critical data (for example context data).

Advantageously, depending on the programming of the keys available in the device, a set of non-contiguous segments is accessible, in read only for some segments and in read / write for other segments.

Advantageously, the size of the segments is arbitrary so that it can be optimised for a given application.

Advantageously, the definition of the set of available keys, the logical functions combining these keys and the configuration of segments accessible as a function of the programming of the keys are specific.

It is also possible to define specific features of this device related to the specific definition of the keys, for example:

- one of the segments has a write authorisation accessible as a function of an exceptional state of the computer, thus allowing safe storage of critical data (for example the code),

- segments enabling safe storage of critical data are grouped in pairs ("old" segment and "new" segment) working in flip-flop.
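The sketch below gives one possible reading of these features, with each segment's access right computed as a logical function of keys. The code segment is writable only in an exceptional state, and a flip-flop pair of context segments has only its "new" member writable while the processor is believed healthy. The key names `maintenance`, `healthy` and `flip` and the segment names are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Mapping

Keys = Mapping[str, bool]


class Right(Enum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


# Each segment's right is a logical function of (some of) the keys.
RightFn = Callable[[Keys], Right]


def check(segment: str, write: bool, keys: Keys,
          rights: Dict[str, RightFn]) -> bool:
    """Real time check of one access; unknown segments get no access."""
    right = rights.get(segment, lambda k: Right.NONE)(keys)
    if write:
        return right is Right.READ_WRITE
    return right in (Right.READ, Right.READ_WRITE)


rights: Dict[str, RightFn] = {
    # Code: writable only in an exceptional state of the computer.
    "code": lambda k: Right.READ_WRITE if k["maintenance"] else Right.READ,
    # Flip-flop pair of context segments: only the "new" one of the pair
    # is writable, and only while the processor is believed healthy.
    "ctx_0": lambda k: (Right.READ_WRITE if k["healthy"] and not k["flip"]
                        else Right.READ),
    "ctx_1": lambda k: (Right.READ_WRITE if k["healthy"] and k["flip"]
                        else Right.READ),
}
keys = {"maintenance": False, "healthy": True, "flip": False}
print(check("ctx_0", True, keys, rights))   # True
print(check("ctx_1", True, keys, rights))   # False
print(check("code", True, keys, rights))    # False
```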
Brief description of the drawings

• Figure 1 illustrates a software modular triple redundancy according to known art,

• Figure 2 illustrates the block diagram of the hardware architecture used as a reference in this description,

• Figure 3 illustrates the time diagram of the reference software architecture,

• Figure 4 illustrates the sequencing of the reference architecture,

• Figures 5A and 5B illustrate the global operating sequence: FIGURE 5A illustrates the sequence without the process according to the invention, and FIGURE 5B illustrates the sequence using the process according to the invention,

• Figure 6 illustrates a functional description of the process according to the invention,

• Figure 7 illustrates the block diagram of the entire process according to the invention,

• Figure 8 illustrates the error confinement area at hardware level,

• Figure 9 illustrates the sequence of the process according to the invention and the swapping of the context tables,

• Figure 10 illustrates the vote on the data,

• Figures 11A and 11B illustrate the structure of the vote for the process according to the invention, including the different "soft crash" type sequencing errors, and the structure of the vote / command generation procedure.

Detailed description of particular embodiments

A process according to the invention for a space application is considered as an example throughout the rest of this description.

A typical and generic application of a computer used in space is described below, from both the hardware and the software points of view. The reference architecture illustrated in FIGURE 2 is used as a basis for the description of the process according to the invention.

The onboard management unit 10 illustrated in FIGURE 2 comprises:

- a control unit 11 built around a microprocessor,

- a mass memory 12,

- power interfaces 13, payload interfaces 15, pyrotechnics interfaces 16, thermal interfaces 17 and attitude and orbit control system interfaces 18, connected through a data bus 19,

- a remote control-remote measurement interface 14,

- monitoring and reconfiguration electronics 20,

- DC-DC converters 21 producing switched power supplies AC and permanent power supplies AP.

The power interface 13 is connected to a solar 
generator 25 and to a battery 26. 
The remote control-remote measurement interface 14 is connected through a transmitter / receiver and a duplexer 27 to antennas 28 and 29.

The payload 31 is connected to the control unit 11 through an avionics bus 32. It is connected to the mass memory 12 and to the remote control / remote measurement interface 14 through a high speed serial link 33, and also to the payload interface 15.

The pyrotechnics interface 16 is connected to deployable systems 35.

The thermal interface 17 is connected to heaters and thermistors 36.

The attitude and orbit control system interface 18 is connected to sensors C1, C2, …, Cn, to actuators A1, A2, …, Am, and to a reservoir pressure sensor 37.

This type of architecture is therefore composed of the different processing modules (control unit module) and input / output modules (acquisition modules, control modules). Input / output modules include low level electronics (analog / digital or digital / analog converters, digital or analog channel multiplexers, relays, etc.).

Modules may equally be boards connected by a backplane bus or complete boxes connected through an avionics bus. In both cases, the interface to the bus is made through a master Bus Coupler (CB) on the control unit module, and through subscriber bus couplers on the other modules.

The reference software architecture illustrated in
FIGURE 3 is composed of processing tasks (for example
the attitude and orbit control system task, the thermal
control task, the real time clock task, the onboard
management task, etc.), each task generating results
that must be output from the computer (controls or
commands); these results are output from the computer
as soon as they are calculated. Acquisitions (Acq) are
grouped at the beginning of the real time cycle for
reasons of time consistency (for example for the
attitude and orbit control system).

In FIGURE 3, tasks A, B and C are shown at the
same frequency for clarity of the description.

The activity of these tasks is paced by a real
time cycle triggered by a cyclic Real Time Interrupt
(IT-TR). This cycle starts some tasks cyclically, and
these tasks run either at the same frequency as the
real time cycle or at a sub-frequency of it. Other
tasks are asynchronous and are triggered by events.
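To make the idea of a sub-frequency concrete, here is a small C sketch, assuming a 10 Hz real time cycle, an attitude and orbit control task activated on every cycle and a thermal task activated every tenth cycle (1 Hz). The `task_t` type, the `divider` field and `on_real_time_interrupt` are hypothetical names used for illustration only; a real executive would start the tasks rather than count their activations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cyclic task: activated every `divider` real time cycles
   (divider = 1 means the real time cycle frequency itself). */
typedef struct {
    const char *name;
    unsigned divider;
    unsigned runs;  /* activation counter, for illustration only */
} task_t;

/* Called on each real time interrupt (IT-TR) with the cycle number. */
static void on_real_time_interrupt(task_t *tasks, size_t n, unsigned cycle)
{
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].divider > 0u);
        if (cycle % tasks[i].divider == 0u)
            tasks[i].runs++;  /* a real executive would start the task here */
    }
}
```

Over 20 cycles (2 s at 10 Hz), a task with `divider = 1` is activated 20 times and a task with `divider = 10` twice.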

The reference hardware and software architecture
is shown in FIGURE 4. This figure shows the control
unit 40, the acquisition electronics 41 connected to
sensors 42 and the control electronics 43 connected to
actuators 44; these two electronics 41 and 43 and the
control unit 40 are connected to a data bus 45.

The sequencing of the three main phases Ph1, Ph2
and Ph3 (namely data acquisition, data processing and
generation of commands) involves the three separate
parts of the electronics 40, 41 and 43, with phases Ph2
and Ph3 being nested.
The hardware part of this architecture is based
only on functional blocks and therefore ignores the
specific nature of particular components and their
capacities (if any) for error detection / correction.
The process according to the invention is therefore
self-sufficient. However, any fault tolerant mechanisms
integrated into the components used for a given
application can only improve the error coverage ratio
compared with the process according to the invention
alone.

The potential error signatures of the reference
architecture subject to singular events were
determined. The result was that errors can be grouped
into two essential classes:

- data errors,

- sequencing errors, which may in turn be divided
into sub-classes:

• "soft crash": incorrect branch, but the
microprocessor can come back into phase with the
instructions and continue sequencing instructions,
more or less erratically;

• "hard crash": the microprocessor is no longer
operational; for example, the microprocessor is no
longer in phase with the instructions, the
microprocessor loads data into the instruction
register, the stack pointer is disturbed, instruction
sequencing is blocked waiting for an impossible event,
the microprocessor is in an infinite loop, etc.
These two classes are themselves sub-divided into
several sub-classes, the most important concerning
address errors.

The distinction between a "soft crash" and a "hard
crash" is important: although a hardware device
external to the microprocessor (i.e. a watchdog) is
usually necessary to detect "hard crashes", a software
device may be sufficient to detect a "soft crash", since
the microprocessor continues to execute code in the
case of a soft crash, even if it does so erratically.

Furthermore, microprocessor crashes form a
critical error class since an "uncontrolled
microprocessor" is capable of actions that could have
catastrophic consequences for a space mission;
therefore it is important to make every attempt to
detect them with a short latency, and / or to
establish error confinement areas in order to minimise
the probability of bad commands following an undetected
error.

We will now describe the operation of the process
according to the invention itself.

Globally, the granularity used for detection /
correction is the basic real time cycle of the
computer, for example the cycle of the attitude and
orbit control system task in a platform computer.

The objective of the process according to the
invention (as in a structural duplex) is to let the
computer work without being monitored, and then to
choose, or "vote", only the data that are to be output
from the computer (the commands) or the data that are
used for correction (the context).

There are several advantages in choosing the real
time cycle as the granularity:

- this is the frequency at which acquisitions are
made and at which most sensors / actuators are
controlled;

- a fairly restricted number of "active" data are
available at the end of the real time cycle; there is
no large quantity of intermediate data, and no local
variables are in use; consequently:

• for detection, these data are stored in a set of
tables that are voted,

• a simple and well-located restart context is
available for correction.

More precisely, the detection / correction
granularity for a given task is the frequency of this
task, since the vote is made at the end of the task.
Consequently, if we consider an attitude and orbit
control system task at 10 Hz and a thermal task at
1 Hz, the granularity is 10 Hz for the attitude and
orbit control system and 1 Hz for the thermal task. For
reasons of clarity, "granularity by real time cycle"
will be used in the rest of the document rather than
"granularity by task".

In order to benefit from the efficiency of the
duplex (two identical systems in parallel executing the
same software, with a comparison of the outputs), which
is a means of detecting all errors without exception
regardless of their type (data error, address error,
sequencing error, configuration error, etc.), while
eliminating structural redundancy, the process
according to the invention consists of implementing
duplex operation on a single physical sequence. In a
given real time cycle, two virtual sequences located on
the same physical sequence are multiplexed in time;
the data generated by each execution of a virtual
sequence (for example commands, context) are stored in
"time multiplexed duplex tables" so that they can be
voted before use.
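A minimal C sketch of time multiplexed duplex operation on one physical processor follows. It assumes a hypothetical `process` function and tables of four words: the same processing runs twice in sequence, each run writes only into its own tables, and commands leave the computer only if the two command tables agree word by word. In the process described here the acquisitions for ChV #2 are made separately; for brevity both copies are taken from the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TAB_WORDS 4u  /* hypothetical table size */

/* One set of time multiplexed duplex tables per virtual sequence (ChV). */
typedef struct {
    unsigned acq[TAB_WORDS];  /* TAB-Acq */
    unsigned cde[TAB_WORDS];  /* TAB-Cde */
} duplex_tab_t;

/* Hypothetical application processing: commands computed from acquisitions. */
static void process(const unsigned *acq, unsigned *cde)
{
    for (size_t i = 0; i < TAB_WORDS; i++)
        cde[i] = 2u * acq[i] + 1u;
}

/* Word-by-word vote of the two command tables. */
static bool vote(const duplex_tab_t *t1, const duplex_tab_t *t2)
{
    for (size_t i = 0; i < TAB_WORDS; i++)
        if (t1->cde[i] != t2->cde[i])
            return false;
    return true;
}

/* One real time cycle: ChV #1 then ChV #2 on the same processor, then the vote.
   Returns false (and outputs nothing) if the vote detects an inconsistency. */
static bool run_cycle(const unsigned *sensors, duplex_tab_t *t1,
                      duplex_tab_t *t2, unsigned *commands_out)
{
    assert(t1 != t2);                          /* each ChV owns its own tables */
    memcpy(t1->acq, sensors, sizeof t1->acq);  /* acquisition ChV #1 */
    process(t1->acq, t1->cde);                 /* processing ChV #1 */
    memcpy(t2->acq, sensors, sizeof t2->acq);  /* acquisition ChV #2 */
    process(t2->acq, t2->cde);                 /* processing ChV #2 */
    if (!vote(t1, t2))
        return false;                          /* no command is generated */
    memcpy(commands_out, t1->cde, sizeof t1->cde);
    return true;
}
```

Flipping one bit of the ChV #2 command table after a cycle (a simulated singular event) makes `vote` return false, so no command would be issued.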

After a detection, the correction consists of
inhibiting the current real time cycle and reloading a
healthy context to perform a restart, which consists of
a nominal execution of the next cycle starting from the
reloaded context; everything happens as if there were
a "hole" in the real time cycle.

The process according to the invention is based on
the fact that an error generated by a singular event is
transient: such an error occurring during execution of
the first virtual sequence is not reproduced during
execution of the second virtual sequence (and vice
versa). On the other hand, the process according to
the invention cannot detect static errors, for example
component failures (stuck bit, etc.), or even some
errors due to singular events that would cause a
permanent error (for example blocking of a sequencer).

Actions downstream of the vote module, in other
words data transfers to the control electronics (i.e.
the data bus) and the control electronics itself, are
not protected by the process according to the
invention. The user performs a system analysis to
identify the critical commands that need to be
error-free, and protects them by mechanisms well known
to an expert in the subject: data coding, self-checking
circuits, instrumentation of the control electronics,
etc.

Figure 6 contains a functional description of the 
process according to the invention. 

This figure shows the data flows (bold lines) by
which virtual sequence #1 47 receives acquisitions #1,
virtual sequence #2 48 receives acquisitions #2, and
the secure voter 49 receives the outputs from these two
sequences 47 and 48 and issues commands. It also shows
the error signal (thin lines) that connects the secure
vote module 49 to the two virtual sequences 47 and 48
in order to request a context reload, which initialises
a restart for a correction.

The global sequence of the process according to
the invention is illustrated in FIGURES 5A and 5B:

- Figure 5A: sequence without the process
according to the invention, the commands nevertheless
being grouped at the end of the processing,

- Figure 5B: sequence with the process according
to the invention.

Figure 5A illustrates two real time cycles N and
N+1, and the beginning of cycle N+2.

Each real time cycle is composed of 4 phases
distinct in time:

- data acquisition,

- processing with calculation of the commands,

- transmission of commands,

- scrub and standby task, in this case called
"Scrub + Standby".

Figure 5B illustrates the process according to the
invention, in which each real time cycle is composed of:

- acquisition of virtual sequence #1 (ChV #1),

- processing of virtual sequence #1, the results
being stored in a table TAB #1,

- acquisition of virtual sequence #2 (ChV #2),

- processing of virtual sequence #2, the results
being stored in a table TAB #2,

- vote of tables TAB #1 and TAB #2,

- generation of commands,

- scrub and standby task.

FIGURE 7 is a block diagram of the entire process
according to the invention, presenting all the
circuits necessary for implementing the invention.

A first microprocessor module 50 manages all
software mechanisms, and in particular:

- time duplication of tasks,

- putting variables in tables,

- the secure vote,

- correction by restart,

- management of hardware mechanisms.

A memory access monitoring and time validation
window module 51 is connected to the bus 52 of the
microprocessor 50, to an error detection and correction
memory 53 and to a bus coupler 54.

The module 51 generates an error signal on the
"Reset" terminal of the microprocessor 50, a selection
signal ("chip-select", CS) for the memory 53, and a
selection signal for the bus coupler 54.

The memory 53 is divided into segments, each segment
having a specific access right (validation by keys).
The bus coupler 54, which is connected to a data bus 55
providing access to other computer functions
(acquisition electronics, control electronics, etc.), is
validated by a "time window" type signal.

The process according to the invention is thus
based on the following characteristics:

- three error confinement areas (time, software
and hardware),

- putting variables / data into tables,

- time duplication of processing,

- a single secure software vote module enabling
error detection by comparison of the results of each
processing, the vote module also generating commands,

- a software monitoring process that
participates in checking the integrity of the vote,

- a control unit memory plane protected against
singular events by an error detection and correction
code,

- memory segmentation associated with a hardware
access rights control device that, together with the
previous element, is used to reliably back up the
restart context and to detect addressing errors,

- a check of transfers to the control electronics
through the data bus, by a hardware device
controlling the access right, thus delimiting an error
confinement area,

- correction by restart if an error occurs.

We will now describe each of these characteristics
in turn.



Confinement areas 

The largest error confinement area 60 is composed
of the acquisition electronics 41 and the control unit
40, as illustrated in FIGURE 8; this figure uses the
same references as FIGURE 4. Thus, if an error disturbs
acquisitions or processing, this error cannot be passed
on to the control electronics 43. Therefore errors
occurring subsequent to a singular event in the
acquisition electronics 41 or in the control unit 40
will not generate any bad satellite commands and will
not disturb the mission.

Due to the vote, this confinement area 60 is
effective for errors that the vote module is capable of
detecting. This confinement area is also almost
impervious to other errors due to the presence of an
access rights check: the time validation window
hardware device blocks unauthorised generation of
commands on the bus.
Furthermore, other confinement areas are defined
in the process according to the invention:

- time confinement of errors by real time cycle,
since the correction is based on the granularity of a
real time cycle,

- confinement of errors by software task and by
virtual sequence, both due to the memory access
monitoring device.

Process tables

Operation of the process according to the
invention is based on a set of tables which, in
particular, contain the data to be voted (these tables
are therefore duplicated, one set being managed by
ChV #1 and the other by ChV #2). These tables are
called "time multiplexed duplex tables" since they are
specific to duplex operation, unlike usual software
tables. For example:

- acquisition table (TAB-Acq),

- control table (TAB-Cde),

- context table (TAB-Ctxt).

Each of the context tables TAB-Ctxt #1 and #2 is
actually composed of a set of two tables that exchange
roles at the end of the task (i.e. at the end of the
vote), so that they alternate from one cycle to the
next; this makes it possible to restore the context
when making a correction by restart. A pair of indexes
(the "Old" and the "New") is stored in memory and
associated with them.

Thus, for example, for virtual sequence 1, during
cycle #N in task K, the first table may be considered
as "New" and is denoted TAB-Ctxt-New #1, while the
second table is "Old" and is denoted TAB-Ctxt-Old #1.
If the vote for task K does not detect any errors, the
swap inverts the roles at the end of the vote: the first
table then becomes TAB-Ctxt-Old #1, and the second
table becomes TAB-Ctxt-New #1.

The "Old" areas are write-protected by the memory
access monitoring device, for example to protect the
restart context from a microprocessor crash once it
has been voted and judged to be sound.
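The "Old" / "New" mechanism amounts to double buffering selected by an index stored in memory. The C sketch below is illustrative only: `ctxt_pair_t`, `ctxt_old`, `ctxt_new` and `ctxt_swap` are hypothetical names, and the write protection of the "Old" half, which the memory access monitoring device provides in hardware, is not modelled.

```c
#include <assert.h>

#define CTXT_WORDS 3u  /* hypothetical context size */

/* Pair of context tables for one virtual sequence, plus the index stored in
   memory that says which half is currently "Old" (the other half is "New"). */
typedef struct {
    unsigned tab[2][CTXT_WORDS];
    unsigned old_idx;  /* 0 or 1 */
} ctxt_pair_t;

static unsigned *ctxt_old(ctxt_pair_t *p) { return p->tab[p->old_idx]; }
static unsigned *ctxt_new(ctxt_pair_t *p) { return p->tab[1u - p->old_idx]; }

/* Called at the end of the vote only when no error was detected: the freshly
   voted "New" table becomes the restart context ("Old") for the next cycle.
   After an error this function is simply not called, so "Old" is kept. */
static void ctxt_swap(ctxt_pair_t *p)
{
    assert(p->old_idx <= 1u);
    p->old_idx = 1u - p->old_idx;
}
```

Only the index changes on a swap; no data are copied, so a faulty cycle that never reaches the swap leaves the previous restart context untouched.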

Sequencing - time duplication 

Compared with the reference software architecture
defined above, the process according to the invention
requires that commands are not generated as they are
calculated, as shown in FIGURE 3, but rather that they
are stored in a table awaiting a vote (see FIGURE 5B).
When processing is complete for the two virtual
sequences, the tables are voted, and the vote module
generates the commands only if no inconsistencies are
detected; in this case, the contents of one of the two
command tables are sent by the control unit 40 to the
control electronics 43 through the bus 45.

For a given task, the global sequence of the
process according to the invention, including the time
duplication aspect, is described below with reference
to FIGURE 9.

This figure illustrates the sequence of the time
multiplexed duplex according to the invention and the
swap of the context tables.

Real time cycles are triggered by real time
interrupts IT-TR.

Each real time cycle is composed of the following
phases:

- sequencer / real time executive (ETR),

- task A,

- sequencer / real time executive,

- task B,

- sequencer / real time executive,

- task C,

- sequencer / real time executive,

- scrub and standby task.

Each of tasks A, B and C consists of the
following, as illustrated in FIGURE 5B:

- virtual sequence number 1 acquisition,

- virtual sequence number 1 processing,

- virtual sequence number 2 acquisition,

- virtual sequence number 2 processing,

- vote and generation of commands.

The steps are as follows:

• During the "processing" module for virtual
sequence No. 1 (ChV#1):

- acquisition of data for ChV#1 and storage in
TAB-Acq#1;

- execution of the processing associated with
ChV#1 starting from TAB-Acq#1 and TAB-Ctxt-Old#1; the
results of this processing are stored in the TAB-Cde#1
and TAB-Ctxt-New#1 tables; no command is generated by
the control unit to be sent to actuators.

• During the "processing" module for virtual
sequence No. 2 (ChV#2):

- acquisition of data for ChV#2 and storage in
TAB-Acq#2;

- execution of the processing associated with
ChV#2 starting from TAB-Acq#2 and TAB-Ctxt-Old#2; the
results of this processing are stored in the TAB-Cde#2
and TAB-Ctxt-New#2 tables; the computer does not
generate any output.

• During the "Vote and generate commands" module,
in other words the comparison of tables and execution
of the actions related to the tables:

- word by word comparison of TAB-Cde#1 and
TAB-Cde#2,

- word by word comparison of TAB-Ctxt-New#1
and TAB-Ctxt-New#2,

- if no errors are detected, the process
continues; otherwise, the microprocessor is put on
standby,

- swap the context tables by changing the index:
TAB-Ctxt-New replaces TAB-Ctxt-Old and is used as the
context for the next real time cycle,

- generate commands: one of the two TAB-Cde
tables is emptied sequentially to transfer command
requests to the command electronics through the data
bus,

- initialisation of the time multiplexed duplex
parameters (time multiplexed duplex tables, time
multiplexed duplex management variables).
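The word by word comparison and the re-initialisation of the duplex tables can be sketched in C as below. Filling the two copies with complementary values (the 1's complement initialisation given in the vote procedure) means that a word which neither virtual sequence writes during the next cycle is caught by the vote. The function names and the `0x5A5A` pattern are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Word-by-word comparison of the ChV #1 and ChV #2 copies of one table.
   Returns -1 if they agree, otherwise the index of the first differing word. */
static long vote_words(const unsigned *t1, const unsigned *t2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (t1[i] != t2[i])
            return (long)i;
    return -1;
}

/* Re-initialise both copies with complementary values (assumed pattern).
   A word that neither sequence rewrites during the next cycle keeps these
   values and therefore makes the next vote fail. */
static void init_complementary(unsigned *t1, unsigned *t2, size_t n,
                               unsigned pattern)
{
    assert(t1 != t2);
    for (size_t i = 0; i < n; i++) {
        t1[i] = pattern;
        t2[i] = ~pattern;
    }
}
```

In the vote procedure, the "Old" context is then copied back into the "New" tables, so context variables that a task does not rewrite every cycle do not trip this check.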

Thus, in FIGURE 9, during real time cycle N+1,
if no errors were detected during real time cycle N,
the entry context to task A is TAB-Ctxt-Old(N), this
table actually containing the data from
TAB-Ctxt-New(N) due to the swap; if errors were
detected, the entry context to task A is
TAB-Ctxt-Old(N-1), this table being the same table
TAB-Ctxt-Old(N-1) as in real time cycle N, since
context switching does not take place in the case of
an error.

During processing modules, a checksum code is 
calculated for each of the tables that will be 
submitted to the vote; it participates in checking the 
exhaustiveness (integrity) of the vote. 
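The document does not specify which checksum or cyclic redundancy code is used. Purely as an example, the sketch below uses the common CRC-16/CCITT parameters (polynomial 0x1021, initial value 0xFFFF, no reflection). The idea shown is that the code accumulated over a table while it is written during processing (CRC-T) must equal the code recomputed by the vote as it reads the same table (CRC-V); if the vote skips words, the two codes differ with high probability. `crc16_update`, `crc16_word` and `crc16_table` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-16/CCITT (poly 0x1021, MSB first), chosen as an example. */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)(byte << 8);
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    return crc;
}

/* Feed one 32-bit table word, most significant byte first (used to build
   CRC-T incrementally as the processing writes the table). */
static uint16_t crc16_word(uint16_t crc, uint32_t w)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        crc = crc16_update(crc, (uint8_t)(w >> shift));
    return crc;
}

/* CRC over a whole table (used to build CRC-V while the vote reads it). */
static uint16_t crc16_table(const uint32_t *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < n; i++)
        crc = crc16_word(crc, tab[i]);
    return crc;
}
```

With these parameters the CRC of the ASCII string "123456789" is 0x29B1, the usual check value for this CRC variant.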

Secure software vote 

As a minimum, the data that need to be voted are
the various outputs from the processing module, as
illustrated in FIGURE 10, namely:

- data 65 output from the computer (the commands),
so as not to generate incorrect actuations,

- data 66 used for the restart (the context: if
cycle N is faulty, the healthy data of cycle N-1 are
restored for the restart), since the restart context
needs to be healthy.

Thus, all data output from the processing module 
are voted. The vote associated with the given task is 
made at the end of this task, as defined by the 
detection / correction granularity. 

Concerning the vote module, the process according
to the invention provides the following
characteristics:

- There is no need to use an external component
tolerant to singular events, provided that vote
security devices can be defined; the architecture is
thus simplified, since the vote can be made entirely
in software by the microprocessor itself, with support
from the few hardware devices needed elsewhere in the
process according to the invention.

- The software vote is not duplicated.

The process according to the invention is based on
making the best use of the detection capacities of the
duplex architecture, which can detect all error types,
including sequencing errors, which are the most
difficult to detect and also potentially have the most
serious consequences. These errors affect the
consistency of the time multiplexed duplex tables;
they are therefore detected by a software vote,
provided that the software vote is secure, in other
words that it cannot be triggered by a microprocessor
operating incorrectly. Appropriate devices must
therefore be provided to ensure that the vote is
correct.

Two central elements are provided to make the
vote secure:

- a check that the microprocessor and the control
unit module are in a healthy state at the beginning of
the vote,

- a check, made while the vote is in progress, that
the vote is complete, in order to authorise generation
of commands.

The vote structure is defined on the basis of an
analysis of the possible incorrect branches of a
microprocessor affected by a "soft" crash; FIGURE 11A
illustrates the various possible "soft crash" type
sequencing errors.

"Hard crashes" are handled by a watchdog, which is
the method usually used by an expert in the subject.
The following structure is used for the "Software
vote and generate commands" module associated with task
K, and is illustrated in FIGURE 11B:

a) check the state of the microprocessor on entry
to the vote and the state of the control unit module:
check that the stack pointer is within the authorised
area, and check the microprocessor and control unit
card configuration registers;

b) inhibit caches if possible, to minimise the
probability of an error during the vote;

c) check that a Vote-Key variable is equal to 0,
and then set it to 1 (i.e. vote); this variable is a
key that is used to globally check the correct
sequencing of the microprocessor using a Software
Monitoring process;

d) activate the memory access monitoring device key
indicating that a vote is in progress, which authorises
simultaneous access to the two memory areas ChV#1 and
ChV#2;

e) vote on all tables produced by the time
multiplexed duplex (TAB-Cde, TAB-Ctxt-New) and, as the
vote is made, calculate a cyclic redundancy code
(CRC-V) for each table;

f) check that Vote-Key is equal to 1, then set it
to 2 (i.e. generate commands);

g) compare CRC-V with the cyclic redundancy codes
(CRC-T) calculated during processing;

h) deactivate the memory access monitoring device key
indicating that a vote is in progress;
i) if the results of tests e, f and g are correct,
open a bus coupler time validation window using the
time validation window system;

j) reinitialise the command card configuration
registers;

k) generate commands to the bus coupler;

l) check the command card configuration registers
and take action according to the type of error
detected (usually resend the command); the time window
has terminated, or will terminate;

m) check that Vote-Key is equal to 2, then set
it to 3 (i.e. switching and initialisation);

n) swap the context tables for task K by inverting
the pair of "Old" and "New" indexes stored in memory;

o) initialise all tables of task K apart from the
"Old" tables, with 1's complement values between ChV#1
and ChV#2;

p) transfer the "Old" tables to the "New" tables and
vote to check this transfer; this transfer is necessary
to make sure that the variables remain valid in the
long term if they are not systematically updated each
time the task is executed;

q) check that Vote-Key is equal to 3, then set it to
0 (i.e. inhibited);

r) re-enable caches.
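The role of the Vote-Key variable in steps c), f), m) and q) can be sketched in C as a key that must hold the expected value on entry to each part of the module before it is advanced. The numeric encoding, the function names and the reduction of steps e) and g) to two boolean inputs are assumptions for illustration; the hardware-related steps are only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the Vote-Key values used in steps c), f), m), q). */
enum { VK_IDLE = 0, VK_VOTE = 1, VK_COMMANDS = 2, VK_SWITCH = 3 };

/* In the real system this variable lives in the EDAC-protected memory plane. */
static volatile unsigned vote_key = VK_IDLE;

/* Check the value expected on entry to a segment, then set the next value.
   A mismatch means this point was reached along an unexpected path; the
   caller must then stop and put the microprocessor on standby. */
static bool vote_key_advance(unsigned expected, unsigned next)
{
    assert(next <= VK_SWITCH);
    if (vote_key != expected)
        return false;
    vote_key = next;
    return true;
}

/* Skeleton of the vote module for one task; `tables_equal` and `crc_ok`
   stand for the outcomes of steps e) and g). Returns true if commands may be
   generated and the cycle completed. */
static bool vote_module(bool tables_equal, bool crc_ok)
{
    if (!vote_key_advance(VK_IDLE, VK_VOTE)) return false;       /* c) */
    /* d) set the memory access key "vote in progress" (not modelled) */
    if (!tables_equal) return false;                             /* e) */
    if (!vote_key_advance(VK_VOTE, VK_COMMANDS)) return false;   /* f) */
    if (!crc_ok) return false;                                   /* g) */
    /* h)-l) clear the vote key, open the time window, send commands */
    if (!vote_key_advance(VK_COMMANDS, VK_SWITCH)) return false; /* m) */
    /* n)-p) swap context tables and re-initialise duplex tables */
    return vote_key_advance(VK_SWITCH, VK_IDLE);                 /* q) */
}
```

A jump into the middle of the module (simulated by forcing `vote_key` to an unexpected value) is refused at the first key check. How the key is reset after a failed vote is not stated in the text and is left open in this sketch.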

The "vote / generate commands / switch /
initialise time multiplexed duplex tables" procedure
cannot be pre-empted, i.e. it must not be stopped by a
higher priority task (it must be completed once it has
been started). However, an interrupt can temporarily
interrupt this module.

Software monitoring process

A Software Monitoring process, known to an expert
in the subject, is a means of handing over control of
the correct sequence of microprocessor instructions to
the software itself. The software is broken down into
linear elementary segments, in other words segments
between two branches. Since linear segments do not
themselves contain branching instructions, once the
microprocessor has executed the first instruction in
such a segment, it must continue until the last
instruction in this segment has been executed.

A check is made that the microprocessor has actually
entered a linear segment at its exact entry point and
not elsewhere, by setting a key at its entry point and
checking it at its exit point.

The vote security uses a number of means, including
this Software Monitoring process, the "Vote-Key"
software variable being a key associated with this
process. The vote module is partitioned into three
functional segments (vote, generate commands, switching
and initialisation), the rest of the application
software in a sense representing a fourth segment:

- the value of the Software Monitoring process key
is checked at the beginning of each segment, to make
sure that the microprocessor actually exited from the
previous segment exactly at its exit point and not
elsewhere, and that the segments are correctly chained
in sequence with each other,

- then the Software Monitoring process key is set
to a value corresponding to the current segment,
immediately after this check at the beginning of the
segment,

- at the end of each segment, a check is made that
the microprocessor actually entered the current segment
at its exact entry point and not elsewhere.
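A minimal C sketch of these entry and exit checks, assuming hypothetical key values and two linear segments A and B, where B must follow A. The key left by the previous segment is checked on entry, the current segment's value is then written, and the exit check confirms that the segment was entered through its entry point. In the vote module, the Vote-Key variable plays the role of `sm_key`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical signature values, one per linear segment. */
#define SEG_A 0xA1u
#define SEG_B 0xB2u

static unsigned sm_key = SEG_A;  /* value left by the previous segment (A) */
static bool sm_error = false;    /* set on any control-flow mismatch */

/* Entry point: the previous segment must have exited normally, then the key
   is set to the current segment's value. */
static void sm_enter(unsigned expected_prev, unsigned self)
{
    assert(expected_prev != self);
    if (sm_key != expected_prev)
        sm_error = true;
    sm_key = self;
}

/* Exit point: the key must still hold this segment's value, i.e. the segment
   was entered through sm_enter() and not by a jump into its middle. */
static void sm_exit(unsigned self)
{
    if (sm_key != self)
        sm_error = true;
}

/* Linear segment B, expected to follow segment A; no branch inside. */
static unsigned segment_b(unsigned x)
{
    sm_enter(SEG_A, SEG_B);
    x = x * 3u + 1u;
    sm_exit(SEG_B);
    return x;
}
```

Calling `segment_b` a second time without passing through segment A is flagged, because the key found on entry is `SEG_B` rather than `SEG_A`.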


Protection of the memory plane 

Memory planes are conventionally protected against
singular events by an error correction code (EDAC) and
a scrub task that reads the entire memory plane to
detect and correct dormant errors. This is necessary
because, once several errors have accumulated in the
same word, they can no longer be detected and / or
corrected.

The process according to the invention is based
on:

- a memory that is reliable with respect to
singular events, due to the use of an error correction
code (EDAC);

- a memory that is reliable with respect to
incorrect writes following an address error, an
instruction error, a microprocessor crash, etc., by
monitoring access rights.

Memory access monitoring device 

The Memory Access Monitoring (SAM) device is a
hardware device derived from conventional block memory
protection units. It is used to check that a
microprocessor attempting to access a delimited
memory area actually has access rights to this area.

The memory access monitoring device can detect
most address errors. In particular, it can very
quickly detect many microprocessor crashes, since a
microprocessor frequently goes outside the allowed
address area after a "soft" crash.

The memory access monitoring device has some 
special features compared with a conventional block 
memory protection unit: 

- the size of the segments is arbitrary, and is 
defined as a function of the applications, 

- the access authorisation is made by programming 
keys memorised in registers internal to the memory 
access monitoring device, the definition and 
combination of these keys being specific to the process 
according to the invention. 

The following is a list of keys integrated in the
memory access monitoring device:

- Key preventing write access to the area
storing the code, since a code error would be an
error mode common to the two virtual sequences and
would not be detected by the vote. This key authorises
writing to this area only during initialisation of the
computer, when the code in read only memory is
transferred into RAM.

- Key indicating which virtual sequence is
current, ChV#1 or ChV#2, and preventing the
microprocessor from accessing the memory area
containing the ChV#2 tables when ChV#1 is being
executed (and vice versa). This key makes each virtual
sequence impervious to errors occurring in the other.

- Key indicating that a vote is in progress; when
this key is active, it exceptionally allows the
microprocessor to access both areas ChV#1 and ChV#2
so that the vote can be made.

- Key indicating which task is current, and
allowing the microprocessor to access only the memory
area containing the tables for the software task
currently being executed. This key makes each task
impervious to errors occurring in the other tasks.

- Key indicating which of the two "Old" / "New"
table sets working in swap are the "Old" areas and
which are the "New" areas, writing being prohibited in
the "Old" areas.

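The access decision made by the memory access monitoring device can be modelled in C as a function of the programmed keys and of the attributes of the segment being accessed. This is a behavioural sketch under assumed data structures: `sam_keys_t`, `segment_t` and `sam_check` are hypothetical, and the real device is hardware that raises an error signal rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { AREA_CODE, AREA_TABLES } area_kind_t;

/* Hypothetical attributes of one memory segment. */
typedef struct {
    area_kind_t kind;
    unsigned task;       /* owning software task (AREA_TABLES only) */
    unsigned chv;        /* owning virtual sequence: 1 or 2 */
    bool ctxt_pair;      /* segment is one half of an Old/New pair */
    unsigned swap_half;  /* 0 or 1: which half of that pair */
} segment_t;

/* Hypothetical image of the key registers programmed by the software. */
typedef struct {
    bool init_phase;        /* code area writable only during initialisation */
    unsigned current_chv;   /* virtual sequence being executed: 1 or 2 */
    bool vote_active;       /* vote in progress: both ChV areas accessible */
    unsigned current_task;  /* task being executed */
    unsigned old_half;      /* which half of each pair is currently "Old" */
} sam_keys_t;

/* True if the access is allowed; false corresponds to the error signal. */
static bool sam_check(const sam_keys_t *k, const segment_t *s, bool is_write)
{
    assert(k->current_chv == 1u || k->current_chv == 2u);
    if (s->kind == AREA_CODE)
        return !is_write || k->init_phase;        /* code key */
    if (s->task != k->current_task)
        return false;                             /* task key */
    if (!k->vote_active && s->chv != k->current_chv)
        return false;                             /* ChV key and vote key */
    if (is_write && s->ctxt_pair && s->swap_half == k->old_half)
        return false;                             /* Old / New key */
    return true;
}
```

For example, while ChV#1 of task 0 runs, a read of a ChV#2 table is refused until the "vote in progress" key is set, and a write into the "Old" half of a context pair is always refused.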
Time validation window system 

The Time Validation Window (FVT) system is an
innovative hardware device. It is made using a
conventional time counter and confines hardware
errors. It is designed to:

- prevent a crashed microprocessor executing the
command electronics management code from generating a
command without having correctly obtained an access
right;

- prevent a microprocessor executing an incorrect
"write to address corresponding to a command" type
instruction from accidentally generating a command.
The time validation window device therefore
protects the system from accidental commands and from
the potentially catastrophic consequences they could
have for the application. It is armed in advance to
authorise access to the controlled electronics: a time
access validation window is opened.

In the error cases mentioned above, the
microprocessor does not access the command electronics
by executing the interface procedure in full;
consequently, the unauthorised access is immediately
detected by this system, since the microprocessor has
not previously opened the time validation window.

The time validation window device is armed after
the decision has been made that no errors are
present. This decision is based firstly on checking
the healthy state of the microprocessor and the control
unit (at the beginning of the vote, then with the
"Vote-Key" variable and the "Checksum" during the
vote), and secondly on the result of the vote.
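The behaviour of the time validation window can be modelled in C as a down-counter that only the vote module loads, after it has decided that no error is present; a command write reaching the bus coupler outside an open window is blocked and flagged. The sketch below is a software model under assumed names (`fvt_t`, `fvt_arm`, `fvt_tick`, `fvt_bus_write`) and an assumed window length expressed in counter ticks; it is not the hardware design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the time validation window (FVT): a down-counter
   loaded by the vote module only after the "no error" decision. */
typedef struct {
    unsigned remaining;  /* ticks left in the open window, 0 = closed */
    bool error;          /* unauthorised access detected */
} fvt_t;

/* Arm the window (step i) of the vote module). */
static void fvt_arm(fvt_t *w, unsigned ticks)
{
    assert(ticks > 0u);
    w->remaining = ticks;
}

/* One tick of the hardware time counter: the window closes by itself. */
static void fvt_tick(fvt_t *w)
{
    if (w->remaining > 0u)
        w->remaining--;
}

/* Bus coupler selection: a command write is passed on only while the window
   is open; otherwise it is blocked and flagged as unauthorised. */
static bool fvt_bus_write(fvt_t *w)
{
    if (w->remaining == 0u) {
        w->error = true;
        return false;
    }
    return true;
}
```

A crashed microprocessor that jumps straight into the command code has not armed the window, so its first write is refused; writes made after the window has expired are refused in the same way.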


Correction 

The correction is executed according to the
following sequence:

- when an error is detected, the current real time
cycle (number N) is inhibited and no command is
generated; the microprocessor goes to standby mode
while waiting for the next real time cycle,

- the next real time cycle N+1 is executed from
the previous context N-1 (and not from context N, which
is no longer reliable) and from the acquisitions for
the current cycle N+1.

The incorrect real time cycle is not replayed; all
that is done is to inhibit the current real time cycle
and restore the context of the previous cycle. If an
error occurs, the microprocessor does not generate
commands for the current real time cycle since it is
put on standby; everything happens as if there were a
"hole" in the real time cycle.

The correction does not require any specific
action: since the microprocessor is put on standby after
a detection, it does not continue executing the vote
module. This naturally prevents the swapping of the
"Old" and "New" contexts, which takes place at the end
of the vote module.
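The rollback mechanism can be illustrated with a double-buffered context. In this hypothetical Python sketch, each cycle builds a "New" context from the "Old" one and the current acquisition; the swap only happens if no error was detected, so a detection leaves the previous context in place for the next cycle. The `CycleRunner` class, the `step` function and the signalling of a detection through a boolean flag are assumptions.

```python
from typing import Callable, Dict, Optional

Context = Dict[str, int]  # hypothetical application state carried between cycles


class CycleRunner:
    def __init__(self, initial: Context,
                 step: Callable[[Context, int], Context]) -> None:
        self.old: Context = dict(initial)  # "Old": last context validated by a vote
        self.step = step                   # application processing for one cycle

    def run_cycle(self, acquisition: int, error_detected: bool) -> Optional[Context]:
        # Each cycle starts from the "Old" context plus this cycle's acquisitions.
        new = self.step(dict(self.old), acquisition)
        if error_detected:
            # Standby: no commands, and the end of the vote module (where the
            # swap happens) is never reached, so "Old" is left untouched.
            return None
        self.old = new  # swap of "Old" and "New" at the end of the vote module
        return new      # commands for this cycle would be derived from it


runner = CycleRunner({"sum": 0}, lambda ctx, acq: {"sum": ctx["sum"] + acq})
runner.run_cycle(1, error_detected=False)  # cycle N-1 validated
runner.run_cycle(2, error_detected=True)   # cycle N: "hole", no commands
runner.run_cycle(3, error_detected=False)  # cycle N+1 restarts from N-1's context
```

In this example the final context holds `{"sum": 4}`: the acquisition of the inhibited cycle is simply lost, which matches the statement that the incorrect cycle is not replayed.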

Given the transient nature of errors detected by 
the process, a single restart attempt is made. If this 
attempt is not successful, the computer would have to 
be completely reinitialised. 
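The restart rule can be expressed as a small state machine. The sketch below assumes, as an interpretation not stated in the text, that the "restart attempt" is the cycle following a detection, and that a detection during that cycle means the attempt has failed; all names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"
    RESTART_FROM_PREVIOUS_CONTEXT = "restart"
    FULL_REINITIALISATION = "reinitialise"


class RecoveryPolicy:
    """Hypothetical model: one restart attempt, then full reinitialisation."""

    def __init__(self) -> None:
        self.restart_in_progress = False  # True during the cycle after a detection

    def on_cycle_end(self, error_detected: bool) -> Action:
        if not error_detected:
            self.restart_in_progress = False  # restart (if any) succeeded
            return Action.NORMAL
        if self.restart_in_progress:
            # The single restart attempt has failed: give up on rollback.
            self.restart_in_progress = False
            return Action.FULL_REINITIALISATION
        self.restart_in_progress = True
        return Action.RESTART_FROM_PREVIOUS_CONTEXT
```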


Sequencer - Real time executive 

The sequencer, or real time executive, that
sequences the software tasks is not directly
protected. The objective is to use a commercially
available executive, and therefore not to modify it
to include fault tolerant mechanisms.

However, the execution time dedicated to these
tasks is very small compared with the total execution
time. Consequently, task scheduling errors are
common-mode errors for the two virtual sequences and
are not detectable, but they have a very small impact
on the coverage ratio.

Furthermore, confinement areas are capable of
blocking some undetected errors before a bad command is
issued, thus reducing the impact of errors originating
from the scheduler.

Variant embodiments 

Variants of the process according to the invention
are possible, in particular by simplifying some of its
characteristics, for example:

- Simplification of vote security mechanisms: 
elimination of the checksum calculation, the check by 
the software monitoring process being considered to be 
sufficient. 

- Simplification of the memory access monitoring
device: no watertight partitions between ChV#1 and
ChV#2 (elimination of the key indicating the current
virtual sequence), since the probability of identical
errors in the two sequences is a priori very small.
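The effect of this second simplification can be seen in a hypothetical model of the partition check. In the sketch below, each virtual sequence owns one address range, the monitor holds the key of the sequence currently executing, and a write into the other sequence's range is refused; setting `enforce_partitions=False` models the simplified variant. The address ranges, class names and treatment of addresses outside both partitions are assumptions, and only the partition check of the device is modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Partition:
    start: int  # first address owned by the sequence (inclusive)
    end: int    # end of the owned range (exclusive)

    def contains(self, address: int) -> bool:
        return self.start <= address < self.end


class MemoryAccessMonitor:
    """Hypothetical model of the partition check only."""

    def __init__(self, partitions: Dict[int, Partition],
                 enforce_partitions: bool = True) -> None:
        self.partitions = partitions                  # sequence key -> owned range
        self.enforce_partitions = enforce_partitions  # False = simplified variant
        self.current_key: Optional[int] = None        # key of the running sequence

    def set_current_sequence(self, key: int) -> None:
        self.current_key = key

    def write_allowed(self, address: int) -> bool:
        if not self.enforce_partitions:
            return True  # simplified variant: no partition between ChV#1 and ChV#2
        for key, partition in self.partitions.items():
            if partition.contains(address):
                return key == self.current_key
        return True  # assumption: addresses outside both partitions are shared
```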

Development and embodiment of the process according to 
the invention 

The process according to the invention was
developed so as to allow the most generic and the most
exhaustive validation possible, and to measure the
error coverage rate as fully as possible.

Objective 
The objective is to have a hardware and software
embodiment (mock-up) representative of a typical space
application, in order to validate the process once and
for all. In practice, a space project analyses the
various possible solutions during its preliminary
phases. For new solutions, a mock-up is used in an
attempt to demonstrate correct operation and
suitability to the need, which creates significant
delays before the project team can make a decision
about its use.

Consequently, before this phase, a generic
validation is undertaken in order to provide a complete
file to any interested project, including requirement
specifications, implementation specifications,
implementation files, validation results, results of
error coverage rate measurements, etc.

Thus, in the preliminary phase, any project can
have the complete development / validation file for
this process without the need to redevelop a mock-up.
Consequently, the suitability of the process to the
needs of the project can be assessed quickly (for
example through an audit) and a decision made about
its selection.

Validation method

The process is validated by fault injection. Two
types of injection are used, with separate objectives:

- In a first phase, deterministic errors are
injected by software. Since this injection is
synchronous, error scenarios can be replayed when the
process fails. This phase can thus be used to validate
the process and, if necessary, to modify it to improve
its error detection / correction performance.

- In a second phase, random errors are injected by
applying a particle beam (heavy ions, protons) to the
main components of the embodiment, using a particle
accelerator. This phase complements the previous one
and enables an end-to-end validation, since the
injected error spectrum is wider. Furthermore, since
the distribution of errors is representative of a real
application environment, it enables an accurate
measurement of the error coverage ratio of the process.
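What makes the first phase replayable is that each injected fault is fully determined by its parameters. A minimal Python sketch, assuming that faults are single bit flips in a word-addressed memory image and that a scenario is identified by a random seed (both assumptions, not a description of the mock-up's actual injector):

```python
import random
from typing import List, Tuple


def inject_bit_flip(memory: List[int], seed: int,
                    word_bits: int = 32) -> Tuple[int, int]:
    """Flip one bit of `memory` reproducibly and return (word index, bit)."""
    rng = random.Random(seed)        # private generator: same seed, same fault
    index = rng.randrange(len(memory))
    bit = rng.randrange(word_bits)
    memory[index] ^= 1 << bit        # emulate a single event upset on that word
    return index, bit


image_a = [0] * 8
image_b = [0] * 8
# Replaying the scenario with the same seed reproduces exactly the same fault.
assert inject_bit_flip(image_a, seed=42) == inject_bit_flip(image_b, seed=42)
```

Using a private `random.Random` instance, rather than the module-level generator, keeps the injected fault independent of any other use of randomness in the test harness.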

Hardware embodiment

The developed hardware embodiment is composed
mainly of three parts: the processing unit, the
acquisition unit and the observability unit.

The processing unit is developed around a PowerPC
603e type microprocessor and its memory, and a
programmable component integrating all hardware
mechanisms of the process.

The acquisition unit simulates several acquisition
channels for the microprocessor, each of these channels
having particular characteristics: acquisitions made
at the request of the microprocessor (simulation of
simple sensors), acquisitions made cyclically that
the microprocessor must read when they arrive
(simulation of intelligent sensors such as a star
sensor or a GPS receiver), reception of remote commands,
etc. These acquisition channels are built around
PIC 16C73A type microcontrollers.

The observability unit integrates the control and
instrumentation functions of the embodiment (software
loading, observability of the microprocessor, etc.) and
a channel enabling simulation of actuations made by the
microprocessor. Outputs on this channel are
systematically checked to verify that the computer
generates no false actuations despite being affected
by transient errors.
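One simple way to check the actuation channel is a plausibility check on each output. The sketch below is hypothetical: it assumes that each actuation is a (channel, value) pair and that a false actuation is one sent to an unknown channel or outside a per-channel allowed range; the real observability unit may apply different criteria.

```python
from typing import Iterable, List, Mapping, Tuple

Actuation = Tuple[str, float]  # hypothetical (channel name, commanded value)


def false_actuations(outputs: Iterable[Actuation],
                     limits: Mapping[str, Tuple[float, float]]) -> List[Actuation]:
    """Return each actuation on an unknown channel or outside its allowed range."""
    suspicious = []
    for channel, value in outputs:
        bounds = limits.get(channel)
        if bounds is None or not (bounds[0] <= value <= bounds[1]):
            suspicious.append((channel, value))
    return suspicious
```

A range check is used here rather than a comparison with an error-free reference run because, after an inhibited cycle, the following cycle legitimately restarts from an older context and may therefore produce commands that differ from the reference without being false.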

Software embodiment

The developed software application (i.e. the 
software embodiment) has the following features to make 
it as representative as possible of onboard real time 
applications, in space or in other applications: 

- sequencing based on a cyclic sequencer that will 
later be replaced by a commercial real time executive; 

- several main application tasks (for example six)
with different priorities, some of them cyclic and
others asynchronous and aperiodic, one of the tasks
being the core of a spacecraft attitude control
program;

- the application is based on real time cycles
running under the control of a real time clock, the
application tasks having different periods;

- several application tasks, for example three, 
are interrupted by higher priority tasks; 



- the software must react in real time to external
asynchronous events originating from the acquisition
channels.
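A cyclic sequencer of the kind mentioned in the first item can be sketched in a few lines. This hypothetical Python model runs, at each real time clock tick, every task whose period divides the tick number, in decreasing priority order. It does not model pre-emption or asynchronous events, and all names and units are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str
    period: int                  # in real time clock ticks (assumed unit)
    priority: int                # higher value runs first within a tick
    body: Callable[[int], None]  # application code, receives the tick number


def run_cyclic(tasks: List[Task], ticks: int) -> List[Tuple[int, str]]:
    """Run a simple cyclic sequencer and return the (tick, task) trace."""
    trace = []
    by_priority = sorted(tasks, key=lambda t: t.priority, reverse=True)
    for tick in range(ticks):            # one iteration per real time clock tick
        for task in by_priority:
            if tick % task.period == 0:  # task is due on this tick
                task.body(tick)
                trace.append((tick, task.name))
    return trace
```

For example, a task "A" with period 1 and priority 1 and a task "B" with period 2 and priority 2, run for 4 ticks, give the trace `[(0, "B"), (0, "A"), (1, "A"), (2, "B"), (2, "A"), (3, "A")]`.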